Use triton wheel, no fork #2959
Merged
Conversation
Contributor
Update build-triton job to first attempt downloading a pre-built wheel from rocm.frameworks-nightlies.amd.com, falling back to source build only when the download fails. Also bump TRITON_COMMIT to d1660454. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Update wheel URL format from triton-3.7.0+amd.git<commit> to triton-3.7.0+rocm7.2.0.git<commit> to match the actual naming convention on the nightly server. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The server returns 403 when '+' is used literally in the URL. Percent-encode it as %2B while keeping the local filename with '+'. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
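The '+'-to-%2B encoding described in this commit can be sketched with Python's standard library. This is an illustration only — the wheel filename below (platform tag included) is made up for the example, and the actual CI script may build the URL differently:

```python
from urllib.parse import quote

# Local wheel filename keeps the literal '+' (version/commit/tags are illustrative).
wheel = "triton-3.7.0+rocm7.2.0.gitd1660454-cp310-cp310-linux_x86_64.whl"

# Percent-encode '+' as %2B for the download URL. quote() always leaves
# letters, digits, '_', '.', '-', and '~' unescaped; safe="-._" just makes
# that explicit and prevents '/' from being treated as safe.
url_path = quote(wheel, safe="-._")
print(url_path)  # '+' becomes '%2B', everything else is unchanged
```

Note that the local file keeps the literal '+': only the URL sent to the server needs the escape.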
The wheel files are under gfx942-gfx950/ not gfx942-gfx950/triton/. The triton/ subdirectory is a PEP 503 index page whose links point to ../ (the parent directory). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Add requirements-triton.txt with --extra-index-url for AMD PyPI
- Add pip install -r requirements-triton.txt in build_aiter_triton.sh
- Remove build-triton job from triton-test.yaml, use BUILD_TRITON=0
- Update README.md with Triton installation instructions

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Relax atol/rtol to 0.1 for bfloat16 due to lower precision (7-bit mantissa). Add fullgraph=True to enforce full graph compilation without eager fallback. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Change dtype params to [float16, bfloat16] across all torch_compile tests
- Add torch._dynamo.reset() to prevent recompile limit with fullgraph=True
- Relax tolerance for bf16 in fused_mul_add and activation tests (atol=0.1)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
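A back-of-envelope check (not from the PR itself) of why bfloat16 needs looser tolerances than float16: bfloat16 stores 7 explicit mantissa bits versus float16's 10, so its unit spacing at 1.0 is 8x coarser, and atol=0.1 leaves headroom for error accumulating across fused ops:

```python
# Unit roundoff comparison: spacing between 1.0 and the next representable value.
bf16_eps = 2.0 ** -7   # bfloat16: 7 explicit mantissa bits -> 0.0078125
fp16_eps = 2.0 ** -10  # float16: 10 explicit mantissa bits -> ~0.00098

print(bf16_eps, fp16_eps, bf16_eps / fp16_eps)  # bf16 spacing is 8x coarser
```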
Move triton dependency from a separate requirements-triton.txt (using AMD PyPI index) to the standard amd-triton package on PyPI, added as both a build and runtime dependency. This simplifies installation by making `pip install -e .` handle triton automatically. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
amd-triton is now available on PyPI directly, so the extra index URL for AMD PyPI is no longer needed. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
If amd-triton is not yet installed, pip uninstall returns non-zero, which would abort setup.py. The reinstall call is kept as check_call to ensure amd-triton is always installed with the latest content. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
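The call/check_call split this commit describes can be sketched as follows. This is a minimal sketch of the pattern, not the actual setup.py code; the function name is made up for illustration:

```python
import subprocess
import sys

def install_amd_triton():
    # Uninstall may legitimately fail when the package isn't present yet:
    # subprocess.call returns the exit code instead of raising, so a
    # non-zero result does not abort setup.py.
    subprocess.call([sys.executable, "-m", "pip", "uninstall", "-y", "amd-triton"])

    # The install itself must succeed: check_call raises CalledProcessError
    # on a non-zero exit, failing the build with a clear error.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "amd-triton"])
```

The distinction is purely in error handling: `call` tolerates failure, `check_call` turns it into an exception.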
Use requirements-triton.txt for triton installation instead of embedding it in pyproject.toml/setup.py. The file now references amd-triton from PyPI directly, no extra index URL needed. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Replace requirements-triton.txt with inline ROCm version detection in setup.py and CI script. Uninstall all conflicting triton packages (triton, pytorch-triton, pytorch-triton-rocm, triton-rocm, amd-triton) before installing amd-triton with the correct --extra-index-url based on the detected ROCm version. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Add ROCm version detection and amd-triton installation to atom-test, vllm_benchmark, and sglang_downstream workflows before pip install -e . Wrap amd-triton install in setup.py with try/except to avoid build failure in PEP 517 isolated environments where pip is unavailable. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Consolidate duplicated ROCm version detection and amd-triton installation logic into .github/scripts/install_triton.sh. Update all CI workflows (build_aiter_triton, atom-test, vllm_benchmark, sglang_downstream) and README to call the shared script. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Replace inline ROCm version detection and amd-triton install code in setup.py with a call to the shared install_triton.sh script. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
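The version-detection logic that install_triton.sh centralizes can be sketched in Python. Everything here is hypothetical: the index-URL template is a deliberate placeholder (not AMD's real index), and reading the version from `/opt/rocm/.info/version` is one common convention, not necessarily what the script does:

```python
import re

# Placeholder template -- the real extra-index URL used by
# install_triton.sh is not reproduced here.
INDEX_TEMPLATE = "https://example.invalid/rocm-{major}.{minor}/simple"

def rocm_index_url(version_str: str) -> str:
    """Map a detected ROCm version string (e.g. the contents of
    /opt/rocm/.info/version, like '7.2.0-12345') to an extra index URL."""
    m = re.match(r"(\d+)\.(\d+)", version_str)
    if not m:
        raise ValueError(f"unrecognized ROCm version: {version_str!r}")
    major, minor = m.groups()
    # amd-triton wheels are published for ROCm 7.0, 7.1, and 7.2 only.
    if (int(major), int(minor)) not in {(7, 0), (7, 1), (7, 2)}:
        raise ValueError(f"no amd-triton wheel published for ROCm {major}.{minor}")
    return INDEX_TEMPLATE.format(major=major, minor=minor)

print(rocm_index_url("7.2.0-12345"))  # https://example.invalid/rocm-7.2/simple
```

The resulting URL would then be passed to `pip install amd-triton --extra-index-url <url>` after uninstalling the conflicting triton packages.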
…nstall amd-triton in aiter-test CI

- Move _get_compiled into torch_compile/__init__.py so all test files import from a single location
- Add FILE_TIMES for the 10 torch_compile tests to split_tests.sh
- Add Install amd-triton step in aiter-test.yaml for standard and multi-gpu test jobs

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Contributor
I've checked the new test files, LGTM! However, I don't have enough CI knowledge to comment on the scripts and workflows. Let's wait for an approval from the AITER CI team.
Contributor
Failures in Flash Attention Integration jobs are being addressed in #2695.
Contributor
Added
Contributor
However, this description is outdated and only shows one point; please update it accordingly.
valarLip
approved these changes
Apr 30, 2026
gyohuangxin
pushed a commit
that referenced
this pull request
May 3, 2026
…2985) PR #2959 introduced .github/scripts/install_triton.sh and added an "Install amd-triton" step to aiter-test.yaml that calls the script inside the docker container. The container's working directory is the PR's checkout, so any PR opened or last synced before #2959 landed on main does not contain the script and fails with:

bash: line 1: ./.github/scripts/install_triton.sh: No such file
##[error]Process completed with exit code 127.

This blocks Standard Tests on every stale PR (e.g. #2969, all 9/10 shards failing), forcing authors to rebase just to get green CI.

Fix: in the Install amd-triton step, fall back to fetching the script from the base ref via raw.githubusercontent.com when it is not present in the runner workspace. Workflow files for PR events always come from the base branch, so this stays consistent with the rest of the CI flow and adds no security boundary crossing. Applied symmetrically to the Standard Tests (1 GPU) and Multi-GPU Tests (8 GPU) jobs.

atom-test.yaml and sglang_downstream.yaml also call the script after a fresh git clone of the PR sha and would benefit from a similar fallback in a follow-up.
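The raw.githubusercontent.com fallback described above amounts to building a raw URL for the base ref and fetching it only when the script is missing locally. A minimal sketch, assuming the repo is ROCm/aiter (the repo/ref values and helper name here are examples, not the workflow's actual variables):

```python
import os

def raw_script_url(repo: str, ref: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file at a given ref."""
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"

SCRIPT = ".github/scripts/install_triton.sh"

# In the workflow step, the local copy wins; the base-ref URL is fetched
# (e.g. with curl) only when the PR's checkout predates the script.
if not os.path.exists(SCRIPT):
    print("fallback:", raw_script_url("ROCm/aiter", "main", SCRIPT))
```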
chun-wan
pushed a commit
that referenced
this pull request
May 4, 2026
Liang-jianhao97
pushed a commit
that referenced
this pull request
May 7, 2026
Currently triton is either built from source in CI (build-triton job) or relies on whatever version is pre-installed in the Docker base image. This is slow, fragile, and inconsistent across workflows.

AMD now publishes pre-built amd-triton wheels on ROCm 7.0, ROCm 7.1, and ROCm 7.2, making source builds unnecessary. This PR centralizes triton installation into a single shared script that auto-detects the ROCm version and installs the matching amd-triton wheel, ensuring all CI workflows and local development use the same triton distribution.

---------

Co-authored-by: Claude Opus 4 <noreply@anthropic.com>
Liang-jianhao97
pushed a commit
that referenced
this pull request
May 7, 2026
Motivation
Currently triton is either built from source in CI (the `build-triton` job) or relies on whatever version is pre-installed in the Docker base image. This is slow, fragile, and inconsistent across workflows. AMD now publishes pre-built `amd-triton` wheels for ROCm 7.0, ROCm 7.1, and ROCm 7.2, making source builds unnecessary. This PR centralizes triton installation into a single shared script that auto-detects the ROCm version and installs the matching `amd-triton` wheel, ensuring all CI workflows and local development use the same triton distribution.

Additionally, there were no tests verifying that triton operators work correctly with `torch.compile`. This PR adds 10 `torch.compile` compatibility tests to catch regressions early.

Technical Details
Test Plan
Test Result
All tests pass.
Submission Checklist